Code: By Leo Rizk | Notebook: Peter Ma
This notebook is a step by step guide in visualizing the outputs from the ML SETI search and how to interact with the outputted datasets. Any concerns regarding the code feel free to reach out.
# Import libs and utils
import os
from typing import List, Tuple
import numpy as np
import matplotlib.pylab as plt
import blimpy as bl
from astropy.time import Time
import qrcode
import moviepy.video.io.ImageSequenceClip
from utils import plot_waterfall, make_waterfall_plots, get_cadence_list, get_cadence_h5files, is_in_rfi_bin, get_hits
from utils import get_num_hits_per_cadence, plot_highlighted_hist, scatter_plot_all_hits, hist_all_hits
from utils import save_visualizer, plot_everything, plot_from_ID, get_rfi_bins, generate_and_save_visualizers
from utils import generate_movie, generate_and_save_visualizers_and_movie
from utils import get_num_hits_above_threshold, get_filtered_num_hits_above_threshold
from utils import Cadence
from utils import Hit
%matplotlib inline
Currently there exists three main secondary filtering stages.
RFI_FILTER=20,000constant
## Constants
First we need to set some constants that will help either filter down the search, or mandatory constants for the size of the windows to show and the telescope name etc.. # CONSTANTS
MAX_IMSHOW_POINTS = (8192, 4096 / 8)
TELESCOPE = 'GBT'
OBS_TIME = '300s x 3'
COLUMN = 52
BAD_FREQS = [[0, 1100], [1200, 1340], [1900, 2000]]
grid = plt.GridSpec(12, 8, wspace=1.2, hspace=0.1)
RFI_FILTER = 20000
We can plot a specific hit found in one specific target list given some confidence threshold we wish to look above. Note that the j var stands for the number of the hit in the list. And that i is the ith index of the cadence list/ the directories
results = '../GBT_pipeline/result_BLPC0'
all_cadences = get_cadence_list(results)
num_hits_list = get_num_hits_per_cadence(all_cadences, BAD_FREQS, results)
i = 115 # cadence
j = 3914 # hit within cadence
cadence = Cadence(all_cadences[i], results, BAD_FREQS)
hit = Hit(cadence, j)
plot_everything(hit, all_cadences, num_hits_list, grid)
We can also plot all the hits above a certain threshold of a specific cadence using the following set of code. Note that t is the threshold value.
# Saving png images for hits at or above a chosen confidence threshold
all_cadences = get_cadence_list(results)
# Picking a specific cadence
sample_cadences = all_cadences[52]
t = 0.999
cadence = Cadence(sample_cadences, results, BAD_FREQS)
for j in range(cadence.num_hits):
if cadence.hits[j][2] >= t:
hit = Hit(cadence, j)
plot_everything(hit, sample_cadences, num_hits_list, output_dirname='visualizers_clean_test_3',
save=True)
plt.close()
We can also look at the distribution of all the hits across the board. First we can look at the total number of hits across all the searched cadences.
# Observing the number of hits per cadence:
tab = ' ' * (14 - len('Cadence index'))
print('Cadence index', tab, 'Number of hits')
for i, num in enumerate(num_hits_list):
tab = ' ' * (14 - len(str(i)))
print(i, tab, num)
# Finding the maximum hits for one cadence:
print('Maximum number of hits for a single cadence:', max(num_hits_list))
# Finding the total number of hits for across all cadences:
print('Total number of hits across all cadences:', np.sum(num_hits_list))
We can count the number of hits above a certain threshold with the t variable
# Investigating the number of hits per cadence at or above a confidence threshold:
THRESHOLD = 0.999999999
tab = ' ' * (16 - len('Cadence index'))
print('Cadence index', tab, 'Number of {}% hits'.format(THRESHOLD * 100))
filtered_num_hits_list = []
for i in range(len(all_cadences)):
cadence = Cadence(all_cadences[i], results, BAD_FREQS)
k = 0
for j in range(cadence.num_hits):
if cadence.hits[j][2] >= THRESHOLD:
k += 1
filtered_num_hits_list.append(k)
tab = ' ' * (16 - len(str(i)))
print(i, tab, k)
# Finding the maximum hits at or above the threshold for one cadence:
print('Maximum number of {}% hits for a single cadence:'.format(THRESHOLD * 100), max(filtered_num_hits_list))
# Finding the total number of hits for across all cadences:
print('Total number of {}% hits across all cadences:'.format(THRESHOLD * 100), np.sum(filtered_num_hits_list))
We can then compute and plot the histograms with the proper cuts in frequency.
plt.subplot(2, 1, 1)
scatter_plot_all_hits(all_cadences, results, BAD_FREQS)
plt.subplot(2, 1, 2)
hist_all_hits(all_cadences, results, BAD_FREQS)
plt.gcf().set_size_inches(15, 11)
plt.show()
results_clean = results
new_cadence_list = get_cadence_list(results_clean)
new_num_hits_list = get_num_hits_per_cadence(new_cadence_list, BAD_FREQS, results_clean)
intervals, abundance = hist_all_hits(new_cadence_list, results_clean, BAD_FREQS)
plt.axhline(y=RFI_FILTER, ls='--', lw=2, color='black', label='Flitering cutoff')
plt.legend()
plt.gcf().set_size_inches(15, 5)
plt.show()
rfi_intervals = get_rfi_bins(intervals, abundance)
Having gotten all the plots regarding the search statistics, we can then look at the choosing what threshold we'd want to select. This will help limit the total number of candidates we'd need to search by eye.
thresholds = np.arange(0.95, 1.001, 0.001)
num_list = []
for threshold in thresholds:
num = get_num_hits_above_threshold(new_cadence_list, results_clean, threshold, BAD_FREQS)
num_list.append(num)
filtered_num_list = []
for threshold in thresholds:
num = get_filtered_num_hits_above_threshold(new_cadence_list, results_clean, threshold, rfi_intervals, BAD_FREQS)
filtered_num_list.append(num)
plt.rc('font', size=12)
plt.plot(thresholds * 100, num_list, 'o', color='red')
plt.grid()
plt.xlabel('Confidence rate threshold (%)')
plt.ylabel('Number of hits at or above threshold')
plt.title('Total number of hits obtained with respect to confidence rate threshold')
plt.xlim(min(thresholds) * 100, max(thresholds) * 100)
plt.ylim(min(num_list), max(num_list))
plt.xticks(thresholds * 100, size=8.5, rotation=45)
plt.yticks(num_list, size=8.5)
# Recreate axes on top and on right for easier reading
plt.twiny()
plt.xlim(min(thresholds) * 100, max(thresholds) * 100)
plt.xticks(thresholds * 100, size=8.5, rotation=45)
plt.twinx()
plt.ylim(min(num_list), max(num_list))
plt.yticks(num_list, size=8.5)
plt.gcf().set_size_inches(15.5, 17)
plt.show()
After applying a threshold filtering, and then applying Frequency cuts, we further remove frequencies where lots of hits are found. This new threshold is set by the constant RFI_THRESHOLD value. This is shown by the green dotted plots.
plt.rc('font', size=12)
plt.plot(thresholds * 100, num_list, 'o', color='red', label='Unfiltered')
plt.plot(thresholds * 100, filtered_num_list, 'o', color='limegreen', label='Filtered')
plt.grid()
plt.legend()
plt.xlabel('Confidence rate threshold (%)')
plt.ylabel('Number of hits at or above threshold')
plt.title('Total number of hits obtained with respect to confidence rate threshold')
ylim = round(max(num_list) + min(filtered_num_list) + 50, -2)
plt.xlim(min(thresholds) * 100, max(thresholds) * 100)
plt.ylim(0, ylim)
plt.xticks(thresholds * 100, size=8.5, rotation=45)
plt.yticks(np.linspace(0, ylim, 51), size=8.5)
# Recreate axes on top and on right for easier reading
plt.twiny()
plt.xlim(min(thresholds) * 100, max(thresholds) * 100)
plt.xticks(thresholds * 100, size=8.5, rotation=45)
plt.twinx()
plt.ylim(0, ylim)
plt.yticks(np.linspace(0, ylim, 51), size=8.5)
plt.gcf().set_size_inches(15.5, 17)
plt.show()
The end goal of these visualizations is to make an informed decision on how to select these filtering parameters when developing the animation product that will be reviewed by the human eye. This will help characterize the size of the human trial and such. To produce the final product we run the following script.
To visualize and save ALL the plots from all the candidates searched above some threshold we have the following code. Once again there are 2 major modes to consider. Filtering on verus Filtering Off for this example we will have filtering OFF for now. In production we'd have it on.
generate_and_save_visualizers(dirname=results, threshold=0.999999, output_dirname='visualizers_clean_test',BAD_FREQS=BAD_FREQS)